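HuggingFace 每日AI论文速递 (HuggingFace Daily AI Paper Digest)
2025.10.13 | Desktop-interaction pretraining unlocks robot potential; a unified model gives cameras spatial imagination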

Update: 2025-10-13
Description

The 14 papers in this episode are:

[00:20] 🖥 D2E: Scaling Vision-Action Pretraining on Desktop Data for Transfer to Embodied AI

[01:13] 📷 Thinking with Camera: A Unified Multimodal Model for Camera-Centric Understanding and Generation

[01:56] 🎨 TAG: Tangential Amplifying Guidance for Hallucination-Resistant Diffusion Sampling

[02:31] 🧠 Multimodal Prompt Optimization: Why Not Leverage Multiple Modalities for MLLMs

[03:05] 🚀 AutoPR: Let's Automate Your Academic Promotion!

[03:39] 🧭 R-Horizon: How Far Can Your Large Reasoning Model Really Go in Breadth and Depth?

[04:14] 🚀 Webscale-RL: Automated Data Pipeline for Scaling RL Data to Pretraining Levels

[04:56] 🛰 SpaceVista: All-Scale Visual Spatial Reasoning from mm to km

[05:37] 🎥 StreamingVLM: Real-Time Understanding for Infinite Video Streams

[06:19] 🌐 KORMo: Korean Open Reasoning Model for Everyone

[06:42] ♻ Don't Waste Mistakes: Leveraging Negative RL-Groups via Confidence Reweighting

[07:25] 🧠 Bridging Reasoning to Learning: Unmasking Illusions using Complexity Out of Distribution Generalization

[08:16] ⚡ DISCO: Diversifying Sample Condensation for Efficient Model Evaluation

[08:56] 🚗 Progressive Gaussian Transformer with Anisotropy-aware Sampling for Open Vocabulary Occupancy Prediction


[Follow us]

You can also find us on the following platform for more than what the podcast covers:

Xiaohongshu: AI速递
